Laying Lexical Foundations for NLP: the Case of Basque at the Ixa Research Group

Author

  • Xabier Artola-Zubillaga
Abstract

The purpose of this paper is to present the strategy and methodology followed at the Ixa NLP Group of the University of the Basque Country in laying the lexical foundations for language processing. Monolingual and bilingual dictionaries, text corpora, and linguists’ knowledge have been the main information sources from which the lexical knowledge currently present in our NLP system has been acquired. The main lexical resource we use in research and applications is a lexical database, EDBL, which currently contains more than 80,000 entries richly coded with the lexical information needed in language processing tasks. A Basque wordnet has also been built (it currently has more than 50,000 word senses), although it is not yet as fully integrated into the processing chain as EDBL is. Monolingual dictionaries have been exploited to obtain knowledge that is currently being integrated into a lexical knowledge base (EEBL). This knowledge base is being connected to the lexical database and to the wordnet. Feedback obtained from users of the first practical language technology application produced by the research group, a spelling checker, has also been an important source of lexical knowledge, making it possible to improve, correct and update the lexical database. The paper also outlines doctoral research on the lexicon, finished or in progress at the group, along with a brief description of the end-user applications produced so far.

Similar resources

Reciprocal Enrichment Between Basque Wikipedia and Machine Translation

In this chapter, we define a collaboration framework that enables Wikipedia editors to generate new articles while they help development of Machine Translation (MT) systems by providing post-edition logs. This collaboration framework was tested with editors of Basque Wikipedia. Their post-editing of Computer Science articles has been used to improve the output of a Spanish to Basque MT system c...

IXA Biomedical Translation System at WMT16 Biomedical Translation Task

In this paper we present the system developed at the IXA NLP Group of the University of the Basque Country for the Biomedical Translation Task in the First Conference on Machine Translation (WMT16). For the adaptation of a statistical machine translation system to the biomedical domain, we developed three approaches based on a baseline system for English-Spanish and Spanish-English language pai...

Multilingual, Efficient and Easy NLP Processing with IXA Pipeline

IXA pipeline is a modular set of Natural Language Processing tools (or pipes) which provide easy access to NLP technology. It aims at lowering the barriers of using NLP technology both for research purposes and for small industrial developers and SMEs by offering robust and efficient linguistic annotation to both researchers and non-NLP experts. IXA pipeline can be used “as is” or exploit its m...

IXA pipeline: Efficient and Ready to Use Multilingual NLP tools

IXA pipeline is a modular set of Natural Language Processing tools (or pipes) which provide easy access to NLP technology. It offers robust and efficient linguistic annotation to both researchers and non-NLP experts with the aim of lowering the barriers of using NLP technology either for research purposes or for small industrial developers and SMEs. IXA pipeline can be used “as is” or exploit i...

Reusability of wide-coverage linguistic resources in the construction of an English-Basque machine translation system

The prototype translates noun and prepositional phrases from English to Basque. It is important to emphasise that the prototype operates with real texts. The treatment of Basque implies to reuse and to adapt wide-coverage linguistic tools and resources for the language developed by our group (IXA group, http://ixa.si.ehu.es); on the other hand, we will take advantage of other tools and resource...

Publication date: 2004